The data set is quite well structured and of good quality. However, it contains a few problematic entries. This section removes those entries and modifies the data set so that it can be used easily for further analysis.
| Job Title | Location |
|---|---|
| Genomic Data Scientist | Stevenage, United Kingdom |
| Scientist, Data, Methods and Analytics Immuno-inflammation and Specialty Medicines | Stevenage, United Kingdom |
| Scientist in Data, Methods, & Analytics | Brentford, United Kingdom |
| Lead Data Analyst | Brentford, United Kingdom |
Every company name had its rating appended, even though the rating was already provided in a separate column. The appended ratings were therefore removed as well.
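The rating removal can be sketched as follows. This is only an illustration in plain Python, not the code used for the report. It assumes the rating is appended to the name after a line break (for example `"Acme Analytics\n3.8"`). That raw format, the function name `strip_rating` and the sample names are assumptions.

```python
import re

def strip_rating(company_name):
    """Remove a trailing rating such as '\n3.8' from a company name."""
    # Assumed raw format: the name, a line break, then a rating like 3.8.
    return re.sub(r"\s*\n\s*\d(?:\.\d)?\s*$", "", company_name).strip()

# Hypothetical sample values, for illustration only.
raw_names = ["Acme Analytics\n3.8", "Example Corp"]
clean_names = [strip_rating(name) for name in raw_names]
```

Names without an appended rating pass through unchanged, so the function can be applied to the whole column.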
The Size (number of employees) column was of type character, so it was converted to a factor and its levels were set accordingly.
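The idea behind an ordered factor can be imitated in plain Python with an explicit list of levels. The sketch below is a rough analogue, not the original conversion. The size labels in `SIZE_LEVELS` are assumed values and may not match the labels in the data set.

```python
# Assumed size labels, listed from smallest to largest.
SIZE_LEVELS = [
    "1 to 50 employees",
    "51 to 200 employees",
    "201 to 500 employees",
    "501 to 1000 employees",
    "1001 to 5000 employees",
    "5001 to 10000 employees",
    "10000+ employees",
]

def size_rank(size):
    """Return the position of a size label in SIZE_LEVELS, or None if unknown."""
    return SIZE_LEVELS.index(size) if size in SIZE_LEVELS else None

# Hypothetical sample values, for illustration only.
sizes = ["10000+ employees", "1 to 50 employees", "Unknown", "201 to 500 employees"]
known = [s for s in sizes if size_rank(s) is not None]
ordered = sorted(known, key=size_rank)
```

Sorting by rank rather than alphabetically keeps "10000+ employees" at the end, which is the main reason for fixing the level order.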
All values in the Revenue column were given in USD, so the USD label was removed from the values and added to the column name.
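A minimal sketch of this step, assuming each row is a dictionary and that the label appears as the literal text `(USD)` inside each value. The helper name `strip_usd`, the new column name `Revenue (USD)` and the sample values are assumptions.

```python
def strip_usd(rows, column="Revenue"):
    """Move the '(USD)' label from the values into the column name."""
    new_column = f"{column} (USD)"
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        value = row.pop(column)
        row[new_column] = value.replace("(USD)", "").strip()
        cleaned.append(row)
    return cleaned

# Hypothetical sample rows, for illustration only.
rows = [
    {"Revenue": "$100 to $500 million (USD)"},
    {"Revenue": "Unknown / Non-Applicable"},
]
cleaned = strip_usd(rows)
```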
The salary estimate column was very messy: it contained many different ranges, some of which overlapped. It also mixed different estimate types (Glassdoor estimate, employer estimate and per-hour estimate). The estimate type was therefore separated from the estimate value to make the data easier to use, and the ranges were reconstructed to minimise the number of distinct ranges.
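Separating the estimate type from the estimate value could look like the sketch below. It assumes raw strings such as `"$37K-$66K (Glassdoor est.)"` and `"$34-$53 Per Hour(Employer est.)"`. These formats, the pattern and the name `parse_salary` are assumptions, and the reconstruction of the ranges is not covered here.

```python
import re

# Assumed raw format: "$<low>K-$<high>K (<source> est.)", optionally with "Per Hour".
SALARY_PATTERN = re.compile(
    r"\$(?P<low>\d+)K?\s*-\s*\$(?P<high>\d+)K?\s*"
    r"(?P<per_hour>Per Hour)?\s*"
    r"\((?P<source>Glassdoor|Employer) est\.\)",
    re.IGNORECASE,
)

def parse_salary(text):
    """Split a salary estimate into bounds, a per-hour flag and the estimate source."""
    match = SALARY_PATTERN.search(text)
    if match is None:
        return None  # value does not follow the assumed format
    return {
        # Bounds are kept as written: thousands of USD per year,
        # or USD per hour when per_hour is True.
        "min": int(match["low"]),
        "max": int(match["high"]),
        "per_hour": match["per_hour"] is not None,
        "source": match["source"].capitalize(),
    }

annual = parse_salary("$37K-$66K (Glassdoor est.)")
hourly = parse_salary("$34-$53 Per Hour(Employer est.)")
```

Keeping the per-hour flag as its own field makes it possible to count or filter the hourly listings without inspecting the text again.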
All -1 values were converted to NAs.
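In plain Python the same idea can be sketched by mapping the -1 placeholder to `None`, the closest built-in equivalent of NA. The column names in the sample row are hypothetical, and treating both the number -1 and the string `"-1"` as missing is an assumption.

```python
# Assumed placeholders for missing data: numeric -1 and the string "-1".
SENTINELS = (-1, "-1")

def to_missing(value):
    """Return None for a -1 placeholder, otherwise the value unchanged."""
    return None if value in SENTINELS else value

# Hypothetical sample row, for illustration only.
rows = [{"Rating": 3.8, "Founded": -1, "Competitors": "-1"}]
cleaned = [{key: to_missing(value) for key, value in row.items()} for row in rows]
```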
There are 21 job listings that provide salary intervals on a per-hour basis.
Figure 1.1: Maximum and Minimum Salary comparison
Data Scientists have both the highest maximum salary limit and the lowest minimum salary limit, which shows how diverse the Data Scientist job classification can be.
Figure 1.2: Location of Business Analyst job by state
In the USA, Business Analyst jobs are most common in Texas and California. The count appears to be considerably lower in New York, which is a very interesting observation.
Figure 1.3: Location of Data Analyst job by state
There are significantly fewer Data Analyst jobs than Business Analyst jobs. Data Analyst jobs are most common in Texas, California and New York.
Figure 1.4: Location of Data Scientist job by state
The number of jobs for Data Scientists is higher than for Business Analysts and Data Analysts. This was also evident from the bar graph above.
Figure 1.5: Ratio of different company sizes for Business Analysts
Figure 1.6: Ratio of different company sizes for Data Analysts
Figure 1.7: Ratio of different company sizes for Data Scientists
The number of startups (companies with fewer employees) is higher in the Business Analyst field than in the other two, while Data Scientists have more opportunities in larger companies.
Figure 1.8: Business Analyst in various Industries
Figure 1.9: Data Analyst in various Industries
Figure 1.10: Data Scientist in various Industries
Staff Outsourcing and IT Services are the major industries in which these three job classifications are predominant.
Figure 1.11: Data Scientist in various Sectors
Figure 1.12: Data Analyst in various Sectors
Figure 1.13: Business Analyst in various Sectors
Information Technology and Business Services are the predominant sectors in which these job classifications are required.
Figure 1.14: Maximum Salary vs Rating
The graph above appears to support this claim: job ratings tend to get higher as the salary gets higher.
Figure 1.15: Salary vs State
The salary ranges in California, Texas and New York are comparatively higher than in the other states.
Figure 1.16: Sector vs State
The sector count is higher in Texas and California than in the other states. This may also be due to the larger number of listings in these two states.